A polyphase filter for many-core architectures
Authors
Abstract
In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. The polyphase filter is a standard tool in digital signal processing and, as such, a well-established algorithm. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards (Fermi, Kepler, Maxwell) and on the Intel Xeon CPU and Xeon Phi (Knights Corner) platforms. All of our implementations aim to exploit the potential for data reuse that the algorithm offers. Our GPU implementations explore two different methods for achieving this: the first makes use of the L1/texture cache, and the second uses shared memory. We discuss the usability of each of our implementations along with their behaviours. We measure performance in execution time, which is a critical factor for real-time systems; we also present results in terms of bandwidth (GB/s), compute (GFlop/s) and type conversions (GTc/s). In addition, we present our results in terms of the sample rate that a chosen platform can process in real time, which more intuitively describes the expected performance in a signal processing setting. Our findings show that, for the GPUs considered, the performance of our polyphase filter when using lower-precision input data is limited by type conversions rather than device bandwidth. We compare these results to an implementation on the Xeon Phi. We show that our Xeon Phi implementation has a performance that is 1.5× to 1.92× greater than our CPU implementation, but that this is not sufficient to compete with the performance of GPUs. We conclude with a comparison of our best-performing code to two other implementations of the polyphase filter, showing that our implementation is faster in nearly all cases. This work forms part of the Astro-Accelerate project, a many-core accelerated real-time data processing library for digital signal processing of time-domain radio astronomy data.
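The abstract does not give the filter equations, so the following is only a minimal sketch of the FIR stage of a polyphase filter bank in its common textbook form, not the authors' GPU or Xeon Phi kernel. The function name `polyphase_fir`, its parameters, and the flat memory layout (sample `s * n_channels + c` belongs to channel `c` of spectrum `s`) are all assumptions made for illustration. It shows where the data reuse mentioned in the abstract comes from: each block of `n_channels` input samples contributes to `n_taps` consecutive output spectra. A full polyphase filter bank would normally follow this stage with an FFT across channels, which is omitted here.

```python
def polyphase_fir(samples, coeffs, n_channels, n_taps):
    """FIR stage of a polyphase filter bank (illustrative reference version).

    samples: flat sequence of input samples; element s * n_channels + c
             is treated as channel c of input spectrum s (assumed layout).
    coeffs:  flat sequence of n_taps * n_channels filter coefficients,
             indexed as t * n_channels + c.
    Returns a list of output spectra, each a list of n_channels values.
    """
    if len(coeffs) != n_taps * n_channels:
        raise ValueError("coeffs must hold n_taps * n_channels values")

    # Each output spectrum needs n_taps consecutive input spectra.
    n_spectra = len(samples) // n_channels - n_taps + 1

    output = []
    for s in range(n_spectra):
        spectrum = []
        for c in range(n_channels):
            acc = 0.0
            for t in range(n_taps):
                # Input spectrum (s + t) is read again for every output
                # spectrum from s to s + n_taps - 1: this is the reuse
                # that caches or shared memory can exploit.
                acc += coeffs[t * n_channels + c] * samples[(s + t) * n_channels + c]
            spectrum.append(acc)
        output.append(spectrum)
    return output


if __name__ == "__main__":
    # Tiny example: 2 channels, 2 taps, all coefficients equal to 1.
    data = [float(v) for v in range(8)]
    print(polyphase_fir(data, [1.0] * 4, n_channels=2, n_taps=2))
```

In this naive form every input sample is loaded `n_taps` times from memory, which is why an implementation that keeps recently used spectra close to the compute units can reduce memory traffic.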
Similar resources
Comparison of Decimation Filter Architectures for a Sigma-Delta Analog to Digital Converter
Different decimation filter architectures are compared for integration with an existing second order sigma-delta modulator to form a sigma-delta ADC. The decimation filters are implemented using a third order cascaded integrator comb filter programmed to work for oversampling ratios of 64, 128 and 256. IIR-FIR, non-recursive and polyphase architectures of decimation filters are simulated and imp...
Specification of APERTIF Polyphase Filter Bank in ClaSH
CλaSH, a functional hardware description language based on Haskell, has several abstraction mechanisms that allow a hardware designer to describe architectures in a short and concise way. In this paper we evaluate CλaSH on a complex DSP application, a Polyphase Filter Bank as it is used in the ASTRON APERTIF project. The Polyphase Filter Bank is implemented in two steps: first in Haskell as bei...
The Design of Active Polyphase Filter
The fast growth of wireless applications in recent years has driven intense efforts to design highly integrated, high-performance, low-cost RFICs. Both low-IF [33]-[37] and double-quadrature [38]-[40] architectures have been adopted as promising receiver topologies to realize these design goals because they combine the advantages of heterodyne and direct-conversion architectures. In the low-IF ...
Multirate digital filters for symbol timing synchronization in software defined radios
This paper describes the use of a polyphase filterbank to perform the interpolations required for symbol timing synchronization in a sampled-data receiver. The polyphase filterbank possesses advantages over architectures based on separate matched and interpolation filters. Interpolations are realized by filterbank index selection and a separate interpolating filter following the matched filter ...
VLSI Architecture for Forward Discrete Wavelet Transform Based on B-spline Factorization
Based on B-spline factorization, a new category of architectures for Discrete Wavelet Transform (DWT) is proposed in this paper. The B-spline factorization mainly consists of the B-spline part and the distributed part. The former is proposed to be constructed by use of the direct implementation or Pascal implementation. And the latter is the part introducing multipliers and can be implemented w...
Journal title:
- CoRR
Volume abs/1511.03599, Issue
Pages -
Publication date 2015